Sorting by sub-aggregation is broken in terms agg #4643

Closed
uboness opened this issue Jan 7, 2014 · 11 comments

@uboness
Contributor

uboness commented Jan 7, 2014

The terms aggregation supports sorting by a metric sub-aggregation, for example:

{
    "aggregations": {
        "host": {
            "terms": {
                "field": "host",
                "order": {
                    "cpu_avg": "asc"
                }
            },
            "aggs": {
                "cpu_avg": {
                    "avg": {
                        "field": "cpu"
                    }
                }
            }
        }
    }
}

Unfortunately, that got broken along the way.

The fix will be mostly based on #4472

thx @alexbrasetvik

@ghost ghost assigned uboness Jan 9, 2014
uboness added a commit that referenced this issue Jan 9, 2014
…regator to enable access to metrics values without creating buckets

fixes: #4643
@uboness
Contributor Author

uboness commented Jan 9, 2014

fixed

@uboness uboness closed this as completed Jan 9, 2014
brusic pushed a commit to brusic/elasticsearch that referenced this issue Jan 19, 2014
…regator to enable access to metrics values without creating buckets

fixes: elastic#4643
@TrevRUNDSP

Hi all,
I'm using the below code to try and grab the terms that have the greatest sums of field "impression":

{
  "size": 0,
  "query": {
    "filtered": {
      "query": {"match_all": {}},
      "filter": {
        "and": {
          "filters": [
            {"range": {"date_time": {"from": "2013-12-12T00:00:00Z", "to": "2013-12-12T23:59:59Z"}}}
          ]
        }
      }
    }
  },
  "aggs": {
    "XXXXXXXXX": {
      "terms": {
        "field": "XXXXXXXXX.untouched",
        "size": 10,
        "order": {"Impression_summation": "desc"}
      },
      "aggs": {
        "Impression_summation": {"sum": {"field": "impression"}}
      }
    }
  }
}

While this code does order by the sums of that field, it omits many terms that have even greater sums. Only after I increase the size parameter to something pretty high (10 --> 1000 in this case) does it 'catch' the terms with the largest summations of field "impression". Is there other code I should be using for this, or is this a bug? Thanks!

- Trev

@alexbrasetvik
Contributor

Hi Trev,

You probably want to increase the shard_size.

Here's the relevant section in the docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/search-aggregations-bucket-terms-aggregation.html#_size_amp_shard_size

– Alex
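
To see why a small size can miss the terms with the largest sums, here is a rough Python model of the gather step: each shard reports only its own top shard_size terms, and the coordinating node merges those partial lists. This is a simplified illustration, not Elasticsearch's actual implementation; the function name top_terms and the shard data are made up for the example.

```python
from collections import Counter


def top_terms(shards, size, shard_size):
    """Merge per-shard partial results, keeping only `shard_size` terms per shard.

    `shards` is a list of dicts mapping term -> local sum (hypothetical data).
    """
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own top `shard_size` terms by local sum.
        local_top = sorted(shard.items(), key=lambda kv: kv[1], reverse=True)[:shard_size]
        for term, total in local_top:
            merged[term] += total
    # The coordinating node sorts the merged sums and keeps the top `size`.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)[:size]


# Term "y" is never the local winner, but it has the largest global sum (18).
shards = [
    {"x": 10, "y": 6},
    {"z": 11, "y": 6},
    {"w": 12, "y": 6},
]

print(top_terms(shards, size=1, shard_size=1))  # [('w', 12)]
print(top_terms(shards, size=1, shard_size=2))  # [('y', 18)]
```

With shard_size equal to 1, "y" is dropped on every shard before the merge, so the final answer is wrong; letting each shard return more candidates recovers it.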

@TrevRUNDSP

Excellent! I'll give that a whirl. Thanks for the help (and fast response)!

- Trev
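
For anyone following along, the aggregation part of Trev's request with shard_size added might look like the sketch below, built as a Python dict and printed as JSON. The value 1000 is only an illustrative choice (it is the size that happened to work above), and the query/filter part is left out for brevity.

```python
import json

# Terms aggregation from the request above, plus an explicit shard_size.
# 1000 is an arbitrary illustrative value, not a recommendation.
terms = {
    "field": "XXXXXXXXX.untouched",
    "size": 10,
    "shard_size": 1000,
    "order": {"Impression_summation": "desc"},
}

body = {
    "size": 0,
    "aggs": {
        "XXXXXXXXX": {
            "terms": terms,
            "aggs": {"Impression_summation": {"sum": {"field": "impression"}}},
        }
    },
}

print(json.dumps(body, indent=2))
```

This keeps size at 10 for the final result while asking each shard for more candidates.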

@benatwork99

I'm using 1.0.0-RC1 and this doesn't appear to be working correctly for me for a metric sub-aggregation.

If I use the following query:

{
  "aggs": {
    "fieldValues": {
      "terms": {
        "field": "name.field",
        "order": {
          "totalAmount": "desc"
        }
      },
      "aggs": {
        "totalAmount": {
          "sum": {
            "field": "amount"
          }
        }
      }
    }
  },
  "size": 0
}

It appears to be sorted in 'asc' mode, and changing 'desc' to 'asc' produces the same results. Other than that the query does appear to work successfully and returns the expected sort of data in the buckets.

Changing the order key from 'totalAmount' to '_term' works as expected, and 'asc' and 'desc' show different results.

I've looked at the code for 1.0.0-RC1 and the above changes do appear to be in. Was it working for you on RC1, Trev?

@uboness
Contributor Author

uboness commented Jan 30, 2014

@benatwork99 I just indexed a few documents that fit the aggs you have above, and it seems to work as expected. Are you sure you're running it on RC1?

Here's what I ran (save it to a file test_aggs.sh):

curl -XDELETE 'http://localhost:9200/idx?pretty'; echo;

for i in {1..3}
do
  curl -XPOST 'http://localhost:9200/idx/type?pretty' -d '{
    "name" : {
      "field" : "val1"
    },
    "amount" : '$i'
  }'; echo;
done

for i in {1..2}
do
  curl -XPOST 'http://localhost:9200/idx/type?pretty' -d '{
    "name" : {
      "field" : "val2"
    },
    "amount" : '$i'
  }'; echo;
done


sleep 3s; echo;

dir=$1

echo 'executing search ('$dir')';

curl -XGET 'http://localhost:9200/idx/_search?search_type=count&pretty' -d '{
  "aggs": {
    "fieldValues": {
      "terms": {
        "field": "name.field",
        "order": {
          "totalAmount": "'$dir'"
        }
      },
      "aggs": {
        "totalAmount": {
          "sum": {
            "field": "amount"
          }
        }
      }
    }
  }
}'; echo;

desc order: ./test_aggs.sh desc
asc order: ./test_aggs.sh asc
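
As a cross-check, the ordering this script should produce can be worked out by hand: val1 gets amounts 1, 2, 3 (sum 6) and val2 gets amounts 1, 2 (sum 3). The Python sketch below just redoes that arithmetic locally; the helper name expected_buckets is made up for illustration and nothing here talks to Elasticsearch.

```python
from collections import defaultdict

# Documents the script indexes: val1 with amounts 1..3, val2 with amounts 1..2.
docs = [("val1", i) for i in range(1, 4)] + [("val2", i) for i in range(1, 3)]


def expected_buckets(docs, direction):
    """Sum `amount` per name and sort by that sum in the given direction."""
    totals = defaultdict(int)
    for name, amount in docs:
        totals[name] += amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=(direction == "desc"))


print(expected_buckets(docs, "desc"))  # [('val1', 6), ('val2', 3)]
print(expected_buckets(docs, "asc"))   # [('val2', 3), ('val1', 6)]
```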

@benatwork99

Thank you very much for the detailed reply, uboness. If I download the vanilla RC1 distribution and run your query against it, I can confirm that it works as intended. Our version is part of a larger build and configuration process. It's definitely RC1, but there's obviously something else - distribution or configuration - affecting it.

I'm looking into this now, trying to understand what's different. I will post back as soon as I can figure it out, and hopefully this won't have been just a waste of time for you!

@uboness
Contributor Author

uboness commented Jan 30, 2014

@benatwork99 ok, cool, keep us posted (and it's definitely not a waste of our time... we're here to make it work :))

@benatwork99

Aha! OK, it turned out our test and dev setups use a single shard, by setting

index.number_of_shards: 1

in elasticsearch.yml.

With this set to 1, I see ordering by the metric sub aggregate always sorting in ascending order! Leaving it blank (or setting it to 2 or higher) means that the descending order works as expected.

So, I think that's a bug. At least I wasn't doing something daft ;) Do you want me to create a separate issue on it?
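
One way to confirm the symptom mechanically is to check the order of the returned buckets against the requested direction. The sketch below assumes each bucket looks like {"key": ..., "doc_count": ..., "totalAmount": {"value": ...}}; that shape and the sample response are hand-written assumptions for illustration, not output captured from a real cluster.

```python
def is_sorted(buckets, metric, direction):
    """Return True if buckets are ordered by `metric` in the given direction."""
    values = [bucket[metric]["value"] for bucket in buckets]
    return values == sorted(values, reverse=(direction == "desc"))


# Hypothetical buckets mimicking the reported symptom: ascending order
# even though "desc" was requested.
sample = [
    {"key": "val2", "doc_count": 2, "totalAmount": {"value": 3.0}},
    {"key": "val1", "doc_count": 3, "totalAmount": {"value": 6.0}},
]

print(is_sorted(sample, "totalAmount", "asc"))
print(is_sorted(sample, "totalAmount", "desc"))
```

Running the same check against a one-shard index and a multi-shard index would show whether the direction is being honoured in each case.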

@uboness
Contributor Author

uboness commented Jan 30, 2014

@benatwork99 yep.. indeed a bug... opened an issue for it, will be fixed shortly. Thanks! (see.. no waste of time :))

@andrewreedy

@uboness I'm seeing an issue with sorting by a sub-agg when two of the values are equal. For example, with only Doc 1 and Doc 2 indexed the query works fine, but once Doc 3 is added it throws an NPE.

Mapping

{
    "bid" : {
        "properties" : {
            "bid_price" : {
                "type" : "float"
            },
            "company_id" : {
                "type" : "string",
                "index" : "not_analyzed"
            },
            "bid_id" : {
                "type" : "string",
                "index" : "not_analyzed"
            }
        }
    }
}

Doc 1

{
    "bid_price" : 7.0, 
    "company_id" : "7",
    "bid_id": "11"
}

Doc 2

{
    "bid_price" : 6.0, 
    "company_id" : "6",
    "bid_id": "11"
}

Doc 3

{
    "bid_price" : 7.0, 
    "company_id" : "6",
    "bid_id": "11"
}

Agg

  "aggs" : {
        "company_id" : {
            "terms" : { 
                "field" : "company_id",
                "order": {
                    "max_bid": "desc"
                }
            },
            "aggs" : {
                "bid_id" : {
                    "terms" : { 
                        "field" : "bid_id",
                        "order": {
                            "company_max_bid": "desc"
                        },
                        "size"  : 1
                    },
                    "aggs" : {
                        "company_max_bid": {
                            "max": {
                                "field": "bid_price"
                            }
                        }   
                    }
                },
                "max_bid": {
                    "max": {
                        "field": "bid_price"
                    }
                }
            }
        }
    }

Result

   "_shards": {
      "total": 5,
      "successful": 4,
      "failed": 1,
      "failures": [
         {
            "index": "bids_02-03-14",
            "shard": 0,
            "status": 500,
            "reason": "NullPointerException[null]"
         }
      ]
   }
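
To make the tie explicit, here is a small Python sketch that computes the max bid_price per company_id for the three documents above. It does not reproduce the NullPointerException; it only shows that adding Doc 3 makes both companies share the same max_bid, which is the condition described in this report. The helper name max_bid_by_company is made up for illustration.

```python
# The three documents from this comment.
docs = [
    {"bid_price": 7.0, "company_id": "7", "bid_id": "11"},  # Doc 1
    {"bid_price": 6.0, "company_id": "6", "bid_id": "11"},  # Doc 2
    {"bid_price": 7.0, "company_id": "6", "bid_id": "11"},  # Doc 3
]


def max_bid_by_company(docs):
    """Return the highest bid_price seen for each company_id."""
    result = {}
    for doc in docs:
        company = doc["company_id"]
        result[company] = max(result.get(company, float("-inf")), doc["bid_price"])
    return result


print(max_bid_by_company(docs[:2]))  # {'7': 7.0, '6': 6.0}
print(max_bid_by_company(docs))      # {'7': 7.0, '6': 7.0}
```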
